Tailoring data source distributions for fairness-aware data integration

Abstract

Data scientists often develop data sets for analysis by drawing upon sources of data available to them. A major challenge is to ensure that the data set used for analysis has an appropriate representation of relevant (demographic) groups: it meets desired distribution requirements. Whether data is collected through some experiment or obtained from some data provider, the data from any single source may not meet the desired distribution requirements. Therefore, a union of data from multiple sources is often required. In this paper, we study how to acquire such data in the most cost-effective manner, for typical cost functions observed in practice. We present an optimal solution for binary groups when the underlying distributions of data sources are known and all data sources have equal costs. For the generic case with unequal costs, we design an approximation algorithm that performs well in practice. When the underlying distributions are unknown, we develop an exploration-exploitation based strategy with a reward function that captures the cost and approximations of group distributions in each data source. Besides theoretical analysis, we conduct comprehensive experiments that confirm the effectiveness of our algorithms.
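As a rough illustration of the setting described in the abstract (known group distributions, a per-query cost for each source, and a required count per group), the Python sketch below uses a simple greedy rule: query the source with the highest probability, per unit cost, of returning a tuple from a group that is still under quota. This is not the paper's optimal solution or its approximation algorithm. The function names `pick_source` and `collect`, the dictionary layout of a source, and the scoring rule are all assumptions made for illustration.

```python
import random


def pick_source(sources, remaining):
    """Index of the source most likely, per unit cost, to return a tuple
    from a group that still needs more samples (assumed greedy rule)."""
    def score(src):
        useful = sum(p for g, p in src["dist"].items() if remaining.get(g, 0) > 0)
        return useful / src["cost"]
    return max(range(len(sources)), key=lambda i: score(sources[i]))


def collect(sources, required, rng, max_queries=100_000):
    """Simulate querying sources until every group quota is met.

    sources:  list of {"dist": {group: probability}, "cost": float}
    required: {group: number of tuples needed}
    Returns (total cost paid, remaining unmet counts per group).
    """
    remaining = dict(required)
    total_cost = 0.0
    queries = 0
    while any(n > 0 for n in remaining.values()) and queries < max_queries:
        src = sources[pick_source(sources, remaining)]
        groups = list(src["dist"])
        weights = [src["dist"][g] for g in groups]
        # Each query returns one random tuple; we only observe its group here.
        g = rng.choices(groups, weights=weights, k=1)[0]
        total_cost += src["cost"]
        queries += 1
        # Keep the tuple only if its group is still under quota.
        if remaining.get(g, 0) > 0:
            remaining[g] -= 1
    return total_cost, remaining


if __name__ == "__main__":
    demo_sources = [
        {"dist": {"a": 0.9, "b": 0.1}, "cost": 1.0},
        {"dist": {"a": 0.2, "b": 0.8}, "cost": 1.0},
    ]
    cost, left = collect(demo_sources, {"a": 5, "b": 5}, random.Random(0))
    print("total cost:", cost, "unmet:", left)
```

The greedy score ignores how many tuples each group still needs, so it can be far from optimal when quotas are unbalanced. It is meant only to make the cost model concrete.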

Related resources

Data Concern Aware Querying for the Integration of Data Services

There is an increasing trend for organizations to publish data over the web using data services. The published data is often associated with data concerns like privacy, licensing, pricing, quality of data, etc. This raises several new challenges. For instance, it must be ensured that data consumers utilize the data in the right way and are bound to the rules and regulations defined by the data ...

Data Source Selection for Information Integration in Big Data Era

In the big data era, information integration often requires abundant data extracted from massive data sources. Due to the large number of data sources, data source selection plays a crucial role in information integration, since it is costly and even impossible to access all data sources. Data source selection should consider both efficiency and effectiveness issues. For efficiency, the approach shou...

Learning Source Description for Data Integration

To build a data-integration system, the application designer must specify a mediated schema and supply the descriptions of data sources. A source description contains a source schema that describes the content of the source, and a mapping between the corresponding elements of the source schema and the mediated schema. Manually constructing these mappings is both labor-intensive and error-prone,...

Source Integration in Data Warehousing

Source Integration is one of the core problems in Data Warehousing. Two critical factors for the design and maintenance of applications requiring Source Integration, and in particular Data Warehouse applications, are conceptual modeling of the domain, and reasoning support over the conceptual representation. We present a novel approach to conceptual modeling for Source Integration, which allows...

Learning Source Descriptions for Data Integration

To build a data-integration system, the application designer must specify a mediated schema and supply the descriptions of data sources. A source description contains a source schema that describes the content of the source, and a mapping between the corresponding elements of the source schema and the mediated schema. Manually constructing these mappings is both labor-intensive and error-prone,...

Journal

Journal title: Proceedings of the VLDB Endowment

Year: 2021

ISSN: 2150-8097

DOI: https://doi.org/10.14778/3476249.3476299